Epidemiological Dynamics of the 2009 Influenza A(H1N1)v Outbreak in India

We analyze the time-series data for the onset of the A(H1N1)v influenza pandemic in India during
the period June 1 to September 30, 2009. Using a variety of statistical fitting procedures, we obtain a
robust estimate of the exponential growth rate, $\langle\lambda\rangle \approx 0.15$. This corresponds to a basic
reproductive number $R_0 \approx 1.45$ for influenza A(H1N1)v in India, a value which lies towards the lower
end of the range of values reported for different countries affected by the pandemic.

PACS numbers: 05.45.Tp,87.19.xd,02.50.-r,87.23.Cc 



I. INTRODUCTION 

A novel influenza strain termed influenza A(H1N1)v,
first identified in Mexico in March 2009, has rapidly
spread to different countries and is currently the
predominant influenza virus in circulation worldwide [1, 2]. As
of April 11, 2010, it has caused at least 17798 deaths in
214 countries. The first confirmed case in India, a
passenger arriving from the USA, was detected on May 16,
2009 in Hyderabad. The initial cases were passengers
arriving by international flights. However, towards the end
of July, the infections appeared to have spread into the
resident population, with an increasing number of cases
being reported for people who had not been abroad. As
of April 11, 2010, there have been 30352
laboratory-confirmed cases in India (out of 132796 tested) and 1472
deaths have been reported, i.e., 5% of the cases which
tested positive for influenza A(H1N1)v [|J.

To devise effective strategies for combating the spread
of pandemic influenza A(H1N1), it is essential to estimate
the transmissibility of this disease in a reliable manner.
This is generally characterized by the reproductive
number $R$, defined as the average number of secondary
infections resulting from a single (primary) infection. A
special case is the basic reproductive number $R_0$, which
is the value of $R$ measured when the overall population
is susceptible to the infection, as is the case at the initial
stage of an epidemic. Estimates of the basic reproductive
number for influenza A(H1N1)v in reports published
from data obtained for different countries vary widely.
For example, $R_0$ has been variously estimated to be
between 2.2 and 3.0 for Mexico @, 1.72 for Mexico City 0,
between 1.4 and 1.6 for La Gloria in Mexico 0], between
1.3 and 1.7 for the United States Q and 2.4 for Victoria
State in Australia @. The divergence in the estimates for
the basic reproductive number may be a result of
under-reporting in the early stages of the epidemic or due to
climatic variations. They may also possibly reflect the
effect of different control strategies used in different
regions, ranging from social distancing such as school
closures and confinement to antiviral treatments.

In this paper, we estimate the basic reproductive
number for the infections using the time-series of infections
in India extracted from reported data. By assuming an
exponential rise in the number of infected cases $I(t)$
during the initial stage of the epidemic when most of the
population is susceptible, we can express the basic
reproductive number as $R_0 = 1 + \lambda\tau$ (see, e.g., Ref. [10],
p. 19), where $\lambda$ is the rate of exponential growth in the
number of infections, and $\tau$ is the mean generation
interval, which is approximately equal to 3 days @. Using the
time-series data we obtain the slope $\lambda$ of the exponential
growth using several different statistical techniques. Our
results show that this quantity has a value of around 0.15,
corresponding to $R_0 \approx 1.45$.
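
As an illustration of the relation quoted above, the following minimal
sketch evaluates $R_0 = 1 + \lambda\tau$ for the values used in this paper
($\lambda = 0.15$ per day, $\tau = 3$ days). The function names are
hypothetical, and the doubling time $\ln 2/\lambda$ is included only as a
convenient restatement of the same growth rate; it is not a quantity
reported in the text.

```python
import math


def basic_reproductive_number(growth_rate, generation_interval):
    """Return R0 = 1 + lambda * tau, valid for early exponential growth."""
    return 1.0 + growth_rate * generation_interval


def doubling_time(growth_rate):
    """Return the time (in days) for case counts to double at a given rate."""
    return math.log(2.0) / growth_rate


# Values taken from the text: lambda = 0.15 per day, tau = 3 days.
r0 = basic_reproductive_number(0.15, 3.0)
t_double = doubling_time(0.15)
print(f"R0 = {r0:.2f}, doubling time = {t_double:.1f} days")
```

With these inputs the sketch gives $R_0$ of about 1.45 and a doubling time
of roughly 4.6 days.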



II. METHODS 

We used data from the daily situation updates
available from the website of the Ministry of Health and
Family Welfare, Government of India [11]. In our analysis,
data up to September 30, 2009 were used,
corresponding to a total of 10078 positive cases. Note that,
after September 30, 2009, patients exhibiting mild flu-like
symptoms (classified as categories A and B) were no
longer tested for the presence of the influenza A(H1N1)
virus.

As the data exhibit very large fluctuations, with some
days not showing a single case while the following days
show an extremely large number of cases, it is necessary to
smooth the data using a moving window average. We
have used an $n$-day moving average ($n$ = 2-10), which
removes large fluctuations while remaining faithful to the
overall trend.
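
The smoothing step can be sketched as follows. This is a minimal
illustration, not the code used for the analysis: the text does not state
whether the window is trailing or centered, so a trailing window is
assumed here, and the function name and the daily counts are made up for
the example.

```python
def moving_average(counts, n):
    """Trailing n-day moving average (assumed convention).

    Returns len(counts) - n + 1 values; entry k is the mean of
    counts[k], ..., counts[k + n - 1].
    """
    if n < 1 or n > len(counts):
        raise ValueError("window length must be between 1 and len(counts)")
    window_sum = sum(counts[:n])
    averages = [window_sum / n]
    for i in range(n, len(counts)):
        # Slide the window by one day: add the new day, drop the oldest.
        window_sum += counts[i] - counts[i - n]
        averages.append(window_sum / n)
    return averages


# Made-up daily counts with days of zero cases followed by large spikes.
daily = [0, 12, 0, 3, 30, 0, 25]
smoothed = moving_average(daily, 5)
print(smoothed)
```

Note that the smoothed series contains no zeros here even though the raw
series does, which matters later because the fits are performed on the
logarithm of the counts.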



III. RESULTS 

The incidence data for the 2009 pandemic influenza
in India immediately reveal that the disease has
been largely confined to the urban areas of the country.
Indeed, 6 of the 7 largest metropolitan areas of India
(which together accommodate about 5% of the Indian
population [12]) account for 7139 infected cases up to
September 30, 2009, i.e., 70.8% of the data-set we have
used.

FIG. 1: Time-series of the number of infected cases, #Infected, of influenza A(H1N1)v showing the daily data (dotted) as
well as the 5-day moving average (solid line) for India and the six metropolitan areas with the highest number of infections
(whose geographic locations are shown in the adjoining map). The period shown is from June 1 to September 30, 2009. At
the beginning of this period most of the infected people were arriving from abroad, while at the end of it the infection was
entrenched in the local population. The data show that almost all the cities showed a simultaneous increase in the number of
infections towards the end of July and the beginning of August. This is manifested as a sudden rise in #Infected for India as a
whole (note the semilogarithmic scale), and can be taken as the period in which the infection started spreading in the resident
population.

Figure 1 shows the daily number of confirmed infected
cases, as well as the 5-day moving average, from June 1
to September 30, 2009, for the country as a whole and the
six major metropolitan areas which showed the highest
incidence of the disease: Hyderabad, Delhi, Bangalore,
Mumbai, Chennai and Pune. The adjoining map shows
the geographic locations of these six cities. In the period
up to July 2009, infections were largely reported in people
arriving from abroad. There is a marked increase in the
number of infections towards the end of July and the
beginning of August 2009 in all of these cities (note that
the ordinate is in logarithmic scale). This is manifest
as a sudden rise in the number of infected cases for the
country as a whole, implying that the infection started
spreading in the resident population in the approximate
period of July 28 to August 12.

Figure 2(a) shows the exponential slope $\lambda$ estimated
in the following way. The time-series of the number of
infections is first smoothed by taking a 5-day moving
average. The resulting smoothed time-series is then used to
estimate $\lambda$ by a regression procedure applied to the
logarithm of the number of infected cases [log(#Infected)]
across a moving window of length $\Delta t$ days. The origin of
the window is varied across the period June 1 to
August 20 (in steps of 1 day). We then repeat the procedure
by varying the length of the window over the range of 7
days to 36 days. To quantify the quality of regression we
calculate the correlation coefficient $r$ [Fig. 2(b)] between
log(#Infected) and time (in days), and its measure of
significance $p$ [Fig. 2(c)]. The correlation coefficient $r$ is
bounded between $-1$ and 1, with a value closer to 1
indicating a good fit of the data to an exponential increase
in the number of infections. The measure of significance
of the fitting is expressed by the corresponding $p$-value,
which expresses the probability of obtaining a correlation
at least this strong by random chance from uncorrelated data.
The average of the estimated exponential slope,
$\langle\lambda\rangle$, is obtained by taking the mean of all values
of $\lambda$ obtained for windows originating between July 28
and August 12 and of various sizes, for which the correlation
coefficient $r > r_{\mathrm{cutoff}}$ (we consider
$0.75 < r_{\mathrm{cutoff}} < 1$ in our analysis) and the
measure of significance $p < 0.01$. For comparison, we
show again in Figure 2(d) the number of infected cases
of H1N1 in India (dotted) together with its 5-day moving
average (solid line). The horizontal broken lines running
across the figure indicate the period between July 28 and
August 12, which exhibited the highest increase in the
number of infections within the period under study (from
June 1 to September 30).

FIG. 2: (a) The exponential slope $\lambda$ estimated from the
time-series data of the number of infected cases, #Infected, averaged
over a 5-day period to smooth the fluctuations (d, solid
curve). The slope $\lambda$ is calculated by considering the number
of infected cases over a moving window having different sizes
($\Delta t$), ranging between 7 days and 36 days. By moving the
starting point of the window across the period June 1 to
August 20 (in steps of 1 day) and calculating the best-fit linear
slope of the data on a semi-logarithmic scale (i.e., time on a
linear axis, number of infections on a logarithmic axis) we obtain
an estimate of $\lambda$. The arrow indicates the region between July
28 and August 12 (region within the broken lines), which shows
the largest increase in the number of infections within the period
under study, corresponding to the period when the epidemic
broke out in the resident population. Over this time-interval,
the average of $\lambda$ is calculated for the set of starting dates
and window sizes over which (b) the correlation coefficient $r$
between log(#Infected) and $t$ is greater than $r_{\mathrm{cutoff}}$
(we consider $0.75 < r_{\mathrm{cutoff}} < 1$ in our analysis) and
(c) the measure of significance for the correlation $p < 0.01$.

FIG. 3: Average slope $\langle\lambda\rangle$ of the variation in log(#Infected)
with time $t$, as a function of the threshold of correlation
coefficient, $r_{\mathrm{cutoff}}$, used to filter the data. The averaging is
performed for infections occurring within the period July 28 to
August 12 (for details see caption to Fig. 2). Different
symbols indicate the actual daily time-series data (squares) and
the data smoothed over a moving $n$-day period, with $n = 2$
(right-pointed triangle), 3 (diamond), 4 (inverted triangle),
5 (circle) and 10 (triangle). The significance of the
correlation between log(#Infected) and time $t$ is $p < 0.01$ for all
data points used in performing the average. Note that for
$n = 3, 4, 5$ the data show very similar profiles for variation
of $\langle\lambda\rangle$ with $r_{\mathrm{cutoff}}$, indicating the robustness of the
estimate with respect to different values of $n$ used. The sudden
increase in the value of the average slope around $r_{\mathrm{cutoff}} = 0.9$
implies that beyond this region the slope depends sensitively
on the cutoff value. Considering the region where the
variation is more gradual gives us an approximate value of the
slope $\lambda \approx 0.15$, corresponding to a basic reproduction
number $R_0 \approx 1.45$.
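
The moving-window regression described above can be sketched in a few
lines. This is an illustrative reconstruction under stated assumptions,
not the code used for the analysis: the function names are hypothetical,
the input is a synthetic exponential series rather than the reported
data, and the significance filter $p < 0.01$ is omitted because the
standard library has no $t$-distribution (a routine such as
`scipy.stats.linregress` returns the slope, $r$ and $p$ together).

```python
import math


def fit_log_linear(counts):
    """Least-squares fit of log(counts) against the day index.

    Returns (slope, r): the exponential growth rate per day and the
    correlation coefficient between log(counts) and time.
    """
    ys = [math.log(c) for c in counts]
    xs = list(range(len(ys)))
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0.0:
        return 0.0, 0.0  # flat window: no growth, correlation undefined
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def mean_window_slope(smoothed, starts, lengths, r_cutoff):
    """Average the fitted slope over all windows with r > r_cutoff."""
    slopes = []
    for start in starts:
        for length in lengths:
            window = smoothed[start:start + length]
            # Skip truncated windows and windows where log() is undefined.
            if len(window) < length or min(window) <= 0:
                continue
            slope, r = fit_log_linear(window)
            if r > r_cutoff:
                slopes.append(slope)
    return sum(slopes) / len(slopes) if slopes else None


# Synthetic series growing as exp(0.15 t), standing in for smoothed counts.
series = [10.0 * math.exp(0.15 * t) for t in range(60)]
avg_slope = mean_window_slope(series, starts=range(0, 16),
                              lengths=range(7, 37), r_cutoff=0.9)
print(avg_slope)
```

For a noiseless exponential every window returns the same slope, so the
average simply recovers the growth rate that was put in; with real data
the dependence of the average on `r_cutoff` is what Fig. 3 examines.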

Figure 3 shows the average exponential slope $\langle\lambda\rangle$ as a
function of $r_{\mathrm{cutoff}}$, calculated for the original data and
for different periods $n$ over which the moving average is
taken ($n$ = 2, 3, 4, 5 and 10). For $n$ = 3-5, the data show a
similar profile, indicating the robustness of the estimate of
the average exponential slope $\langle\lambda\rangle$ with respect to different
values of $n$. The sudden increase in $\langle\lambda\rangle$ around
$r_{\mathrm{cutoff}} = 0.9$ implies that beyond this region the slope depends
sensitively on the cutoff value. Considering the region
where the variation is smoother gives an approximate
value $\lambda \approx 0.15$, corresponding to a basic reproductive
number for the epidemic $R_0 = 1 + \lambda\tau \approx 1.45$, assuming
the mean generation interval $\tau = 3$ days.

We compute the confidence bounds for the estimate
of $R_0$ from the 5-day moving average time-series by
using the confint function of the scientific software
MATLAB [13]. This function generates the goodness of fit
statistics using the solution of the least squares fitting of
log(#Infected) to a linear function. It results in a mean
value $\langle\lambda\rangle = 0.16$, with the corresponding 95% confidence
interval calculated as [0.116, 0.206], consistent with our
previous estimate of $R_0 \approx 1.45$.
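
To make the consistency explicit, the bounds on the slope can be carried
through the relation $R_0 = 1 + \lambda\tau$. The short calculation below
assumes $\tau = 3$ days exactly, i.e., it ignores any uncertainty in the
generation interval, so the resulting range should be read as
illustrative only.

```latex
\begin{align}
R_0^{\mathrm{lower}} &= 1 + 0.116 \times 3 \approx 1.35, \\
R_0^{\mathrm{mean}}  &= 1 + 0.16 \times 3 = 1.48, \\
R_0^{\mathrm{upper}} &= 1 + 0.206 \times 3 \approx 1.62.
\end{align}
```

The value $R_0 \approx 1.45$ lies within this range.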

We have also used bootstrap methods to estimate the
exponential slope, $\lambda$. This involves selecting random
samples with replacement from the data such that the
sample size equals the size of the actual data-set. The
same analysis that was performed on the empirical data
is then repeated on each of these samples. The range
of the estimated values $\lambda'$ calculated from the random
samples allows determination of the uncertainty in
estimation of $\lambda$. Fig. 4(a) shows the average, $\langle\lambda'\rangle$, calculated
for different periods (with the abscissa indicating the
starting date and the symbol indicating the duration of the
period) from the 5-day moving average time-series data
of infected cases. The curves corresponding to the
periods of different durations (14-16 days) intersect around
July 31, 2009, indicating that the value of the average
exponential slope is relatively robust with respect to the
choice of the period about this date. The average value of
the bootstrap estimates $\lambda'$ at the intersection of the three
curves is 0.15, in agreement with our earlier calculations
of $\lambda$.

FIG. 4: (a) The averages of the bootstrap estimates for the
exponential slope, $\lambda'$, calculated for different periods (with the
abscissa indicating the starting date and the symbol
indicating the duration) from the 5-day moving average time-series
data of infected cases in India. The curves corresponding
to the periods of different durations (14-16 days) intersect
around July 31, 2009, indicating that the value of the
average exponential slope is relatively robust with respect to the
choice of the period about this date. (b) The distribution of
bootstrap estimates of the exponential slope for the period
July 31 to August 15, 2009. The average slope $\langle\lambda'\rangle$ obtained
from 1000 bootstrap samples is 0.166 with a standard
deviation of 0.024, which agrees with the approximate value of
$\lambda = 0.15$ (corresponding to $R_0 = 1.45$) calculated in Fig. 3.

Fig. 4(b) shows the distribution of the bootstrap
estimates of the exponential slope for a particular period,
July 31 to August 15, 2009. The average slope $\langle\lambda'\rangle$
obtained from 1000 bootstrap samples for this period is
0.166 with a standard deviation of 0.024, which indicates
that the spread of values around the average estimate of
$\langle\lambda'\rangle = 0.15$ is not large. This confirms the reliability of
the estimated value of the exponential slope, and hence
of our calculation of the basic reproductive number.
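
A bootstrap of this kind can be sketched as follows. The text does not
specify exactly what is resampled, so this example assumes that (day,
log-count) pairs within a single 16-day window are drawn with
replacement and refitted; the function names, the random seed and the
synthetic counts are all hypothetical and serve only to illustrate the
procedure.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(counts, n_boot=1000, seed=0):
    """Bootstrap the exponential slope by resampling (day, log count) pairs."""
    rng = random.Random(seed)
    pairs = [(day, math.log(c)) for day, c in enumerate(counts)]
    estimates = []
    while len(estimates) < n_boot:
        # Same sample size as the data, drawn with replacement.
        sample = rng.choices(pairs, k=len(pairs))
        xs = [day for day, _ in sample]
        if len(set(xs)) < 2:
            continue  # degenerate resample: slope undefined, draw again
        ys = [log_count for _, log_count in sample]
        estimates.append(ols_slope(xs, ys))
    return estimates


# Synthetic 16-day window: growth rate 0.15 with +/-10% alternating noise.
window = [20.0 * math.exp(0.15 * t) * (1.0 + 0.1 * (-1) ** t)
          for t in range(16)]
estimates = bootstrap_slopes(window)
print(statistics.mean(estimates), statistics.stdev(estimates))
```

The mean of `estimates` plays the role of $\langle\lambda'\rangle$ and its
standard deviation quantifies the spread shown in Fig. 4(b); the numbers
produced by this synthetic example are not comparable to the values
reported for the actual data.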



IV. DISCUSSION 

It may appear surprising that there was a very high
number of infections in Pune (1238 positive cases up to
September 30), despite it being less well-connected to the
other major metropolitan cities of India, in comparison
to urban centres that did not show a high incidence of
the disease. For example, the Kolkata metropolitan area,
which has a population around three times the
population of the Pune metropolitan area [12], had only 113
positive cases up to September 30. This could possibly
reflect the role of local climatic conditions: Pune, located
at a relatively higher altitude, has a generally cooler
climate than most Indian cities. In addition, the close
proximity of Pune to Mumbai and the high volume of road
traffic between these two cities could have helped in the
transmission of the disease. Another feature pointing to
the role of local climate is the fact that in Chennai, most
infected cases were visitors from outside the city, while in
Pune, the majority of the cases were from the local
population, even though the total numbers of infected cases
listed for the two cities in our data-set are comparable
(928 in Chennai and 1213 in Pune). This suggests the
possibility that the incidence of the disease in Pune could
have been aided by its cool climate, in contrast to the
hotter climate of the coastal city of Chennai.

The calculation of $R_0$ for India assumes well-mixing
of the population (i.e., homogeneity of the contact
structure) among the major cities in India. Given the rapidity
of travel between the different metropolitan areas via air
and rail, this may not be an unreasonable assumption.
However, some local variation in the development of the
epidemic in different regions can indeed be seen (Fig. 1).
Around the end of July, almost all the cities under
investigation showed a marked increase in the number of
infected cases, indicating spread of the epidemic in the
local population. This justifies our assumption of
well-mixing in the urban population over the entire country
for calculating the basic reproductive number.

To conclude, we stress the implications of our
finding that the basic reproductive number for pandemic
influenza A(H1N1)v in India lies towards the lower end of
the values reported for other affected countries. This
suggests that season-to-season and country-to-country
variations need to be taken into account in order to formulate
strategies for countering the spread of the disease.
Evaluation of the reproductive number, once control measures
have been initiated, is vital in determining the future
pattern of spread of the disease.